ABSTRACT


Title of Thesis: Time and Source Encoding for Multiplexed Compressed Signals 
Carlo J. Broglio, Master of Science, 1969 

Thesis directed by: Associate Professor Dr. Alan B. Marcovitz

Sensor data transmitted from a spacecraft to the ground contain much re- 
dundancy. Efforts to remove this redundancy, called data compression, tend to 
be in the form of polynomial predictors in which the data are "curve fitted" to the 
longest straight line within a prescribed error tolerance. Once compressed, a 
major problem in reconstructing the data is that of identifying the time of occur- 
rence of each data point. 

This thesis proposes five methods for identifying the time of occurrence of 
a data point. Two methods are simulated on a CDC 3200 computer and the other 
three are calculated by using a probability of occurrence table measured by ap- 
plying zero-order and linear predictors to spacecraft data taken from the Orbiting 
Geophysical Observatory B satellite. A noise-free environment and a prescribed 
error tolerance were assumed in the process of obtaining the probability of
occurrence table. These proposed methods are analyzed and compared by the use
of information theory, and the results are presented in this thesis.

CHAPTER I


INTRODUCTION 

With the advent of satellites, the volume of data obtained and processed each
year has increased enormously. For example, one Orbiting Geophysical Observatory
(OGO) transmits to earth over 5 billion binary digits per day, thereby
overloading the satellite-to-earth data channel. Consequently, it has become
necessary to reduce the data bandwidth required for transmission.

One method under consideration is data compression, the removal of
redundancy from the data. Redundancy occurs in many forms. Natural redundancy
is that redundancy which is intrinsic to the signal being observed. An example
of this is the measurement of temperature, voltage, or current. These
parameters tend to remain constant over long periods of time.

Forced redundancy is inherent in the design of the spacecraft. Examples of 
this kind of redundancy are oversampling, subcommutation identification, and 
spacecraft clocks. 

Correlation redundancy arises because of the space relationship of a sample
to other samples. An example of this is television picture data. If one
considers the picture an (n x m) matrix, each interior point is related to the
eight samples surrounding it. This type of redundancy is beyond the scope of
this paper.

When data compression is used in a telemetry system, the following problems
are encountered: 

1. How to remove the redundancy, 

2. How to control the error introduced by compression,

3. How to identify the time of occurrence of the data received, 

4. How to transmit at nonuniform data rates, 

5. How to recover the data when errors are made during transmission. 
Furthermore, in a multiplexed telemetry system, there is the additional problem 
of identifying the sensor that produced the received data quantity. 

The problem of redundancy removal has been studied on a one-source-one-output
basis in the past (References 1 through 10). Some of the solutions
considered in the past are polynomial prediction, polynomial interpolation,
functional curve fitting, bit plane encoding, and depiction of data in a periodic manner.

The problem of controlling the introduced error must be solved by the user 
of the telemetry compression system since he is the one who best knows the 
limitations of the data. The problem of identifying the time of occurrence of the 
received data sample has also been studied in the past. Some of the solutions
have been run-length encoding, ordering the raw telemetry samples, and
numbering the raw telemetry samples.

The nonuniform rate of the compressed data stream (known as the buffering 
problem) has been the subject of many previous studies. It has been considered 
mainly a design problem in feedback control system theory (References 2, 3,
and 4). The recovery of data after transmission is a problem of error detection
and correction coding theory.

The problem of identifying sources in a multiplexed data stream has
received very little attention in the past. The solutions proposed so far tend to
treat this subject as a time identification problem. Furthermore, the problem
of whether to compress the data and then multiplex it, or to multiplex and then
compress, has not been investigated.

The object of this paper is to study the problems of redundancy removal, 
time sequence identification, and sensor identification with the main emphasis on 
the last two problems. The problem of buffering is not considered because it 
can be treated better in feedback control theory and the problem of transmission 
errors is not considered because it is handled best within the context of error 
correction coding; these two theories are beyond the scope of this paper. The 
previously mentioned solutions do not represent a complete list of the possible 
methods considered, but are intended as representative in each problem study 
area. 

The problem of redundancy removal is approached using zero-order and 
first-order polynomial predictors. These methods were chosen because past 
studies (References 2 through 4) have shown them to be the most promising 
methods when applied to the sensor data used throughout this study. 

The problem of introducing errors by compression was assumed solved by 
allowing a maximum average error rate of one quantization level. This assump- 
tion is meant to be interpreted as a presentation of results at this error level 
rather than as a solution to the problem of error control. 

Five methods of coding time and sensor identification are compared. This
paper differs from past work in that it considers a multiplexed data stream,
suggests a method of data identification based on an existing data processing
system, and is based on actual data from the OGO-B spacecraft, now orbiting
the earth.

The following considerations are of major importance: simplicity in space- 
craft design, minimum changes in the current ground recovery hardware, and 
minimum changes in current data processing. These considerations are neces- 
sary to reduce the cost of converting to a data compression system so that the 
advantages of data compression will not be overcome by the cost of data recovery. 

It is assumed that: 

1. The data are in a noise-free environment, 

2. The compressor is to be flown on the spacecraft, 

3. Data from many experiments are to be multiplexed before transmission, 

4. The experiments are mutually independent. 

The assumption of a noise-free environment is perhaps unrealistic; how- 
ever, the nature of the source is best studied under this condition, and it is 
hoped that the results of this study can be used to solve the noise problem, per- 
haps through coding. 

Two types of compression algorithms were used: the zero-order predictor
(ZOP) and the linear predictor (LP). A detailed description of their operation
appears later. These two algorithms were applied to data from the OGO-B
spacecraft, and the results were evaluated to determine a combination that would allow
high compression with reasonable error rates. After completion of this phase
of the study, attention was focused on time and sensor identification. Two
methods of time identification were simulated on the CDC 3200 computer, and
three others were compared theoretically. The methods simulated on the
computer encode the number of samples skipped within the frame matrix as
identification, whereas the theoretical methods encode each position of the frame
matrix in one case and each source in the other. In all cases, the frame
synchronization code is assumed to be transmitted uncompressed. These methods will be
discussed in detail later.

The frame matrix was maintained throughout this study because of the
simplicity in applying orbit data to the frame sequence to determine the exact
position of the spacecraft at the time the data were transmitted.


CHAPTER II 


THEORETICAL BACKGROUND 
Telemetry 

For the purpose of the present research, the telemetry system is a time- 
sampled digital system with a quantization precision set by a k-bit binary code. 
The data source is assumed to be some analog function of time which is sampled 
at a constant rate. 

In a time-multiplexed telemetry system, different sources are sampled in
a definite order of intermixed, time-shared sampling called commutation. The
commutator may be mechanical or electronic, and its operation may be
inflexible (according to a predetermined pattern), changeable (according to
different patterns, each brought about by ground command), or automatic,
depending on the data. The output of a commutator is a sequence of samples from
different sources, the pattern of samples repeating in some time period which is
typically large in comparison to the time period between samples.

When a time-multiplexed telemetry signal is received, the samples are
"sorted" into sources from the time-multiplexed sequence. This process,
called decommutation, depends on frame synchronization and word synchronization
in the multiplexed data sequence for reliable operation. Word synchronization
uses an easily recognizable "sync" word that is inserted in the sampling sequence
once every period or fractional period of the multiplexed sequence. The details
of a simple, multiplexed pattern with its included word synchronization are
illustrated in Figure 1.

In Figure 1, the letters A through L signify data sources such as space
experiments, attitude sensors, spacecraft subsystem parameters (voltages,
currents, temperatures, etc. associated with the spacecraft and not individual
experiments), and an on-board clock. The subcommutator count (SCC) word has
its lowest value at the beginning of the longest repetitive cycle in the multiplexed
data format and has its highest value at the end of this longest cycle, which is
called the "main frame." In Figure 1, the main frame constitutes eight rows of
the pattern; two main frames are shown. Each row is called a minor frame, and
the synchronization word, called a frame sync pattern (FSP), occurs once each
minor frame. Source A, occurring three times each minor frame, is said to be
supercommutated, that is, sampled more than once each minor frame. Source B
is sampled once every minor frame. Sources C and B are sampled once every
other minor frame, and these are subcommutated, that is, sampled less than once
each minor frame. Sources D through L are also subcommutated, and since their
subcommutation pattern has the longest period, this pattern determines the
subcommutator count.

Usually each major and minor frame are much larger than in this example. 
For example, the OGO-B telemetry format (Reference 11) has a (128 x 128) major 
frame matrix, that is, 128 minor frames, each with 128 samples. 

Figure 1 - A time-multiplexed telemetry pattern.

Data Processing Techniques

After the data are time-multiplexed, they are transmitted to the ground
tracking stations and recorded on tape. The first stage of processing is to obtain
"bit sync" and "frame sync" from the recorded bit stream. For this purpose,
the computer later uses analog-to-digital processors. The output of these
processors is called a "buffer tape" that contains the information transmitted to the
ground in a computer-acceptable format.

The computer analyzes the FSP to determine the data error quality, thereby
providing data quality control. The computer also merges the data with orbit
position data calculated from the spacecraft clock and orbit projection programs.
These data are then stored on tape in the data archives. This tape, called an
edit tape, is the input to the next stage of processing, decommutation. In
decommutation and subsequent data processing, it is necessary only to locate absolute
time as a reference point somewhere within a main frame: the absolute times
of all samples can then be derived from their positions in the main frame (see
Figure 8a).
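
The derivation of sample times from frame positions can be sketched as follows. This is only an illustration: the function name, the 0-based (minor frame, word) indexing, and the assumption of a constant word period with contiguous minor frames are not taken from the thesis.

```python
def sample_time(reference_time, reference_position, position,
                words_per_frame, word_period):
    """Return the absolute time of the word at `position`.

    Positions are (minor_frame, word) pairs, 0-based (an assumption).
    `reference_time` is the known absolute time of `reference_position`.
    A constant `word_period` between consecutive words is assumed.
    """
    ref_frame, ref_word = reference_position
    frame, word = position
    # Count how many word slots separate the two positions.
    offset_words = (frame - ref_frame) * words_per_frame + (word - ref_word)
    return reference_time + offset_words * word_period


# Hypothetical numbers: 128 words per minor frame, 0.5 time units per word.
t = sample_time(100.0, (0, 0), (1, 2), words_per_frame=128, word_period=0.5)
print(t)
```

Because only one reference time is needed per main frame, a scheme that preserves the frame matrix keeps this calculation unchanged.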

Data Compression 

There are several methods for compressing data (Reference 3). Perhaps 
the simplest is polynomial prediction in which an n-th order polynomial is gen- 
erated by the compressor using (n + 1) consecutive samples. The next sample 
is derived by evaluating the polynomial. This prediction is then compared with 
the current data value. If the current data value is within ±K of the predicted 
value, the sample is not transmitted. The value of K is a parameter of the 
compression method and is chosen by the experimenter, based upon the
acceptable peak error.

The simplest of these methods is the zero-order predictor (ZOP). The 
operation of the ZOP is based on predictions of future samples using a horizontal 
projection of a zero-order polynomial from the present sample (Reference 4). 
This method simply adds (or subtracts) the K value, which establishes a peak 
error, to the present sample. As long as subsequent samples fall within this 
range, they are considered redundant and are not transmitted (Figure 2). The 
value of the subsequent sample is then assumed by the receiver to be the same 
as the present sample and to fall on a horizontal line projected through the 
sample. When a future sample falls outside this range, it is transmitted as a 
nonredundant sample. The K value is then set around the new sample, and the 
process is repeated. 
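
A minimal sketch of the zero-order predictor described above, assuming integer samples and treating a sample exactly K counts from the reference as redundant. The function names and the (index, value) output format are illustrative choices, not part of the thesis.

```python
def zop_compress(samples, k):
    """Return the (index, value) pairs a zero-order predictor would transmit."""
    sent = []
    reference = None  # last transmitted value; the aperture is reference +/- k
    for index, value in enumerate(samples):
        if reference is None or abs(value - reference) > k:
            sent.append((index, value))  # nonredundant: transmit it
            reference = value            # re-centre the aperture on it
    return sent


def zop_reconstruct(sent, length):
    """Rebuild the waveform by holding each transmitted value horizontally."""
    out, pos, current = [], 0, None
    for i in range(length):
        if pos < len(sent) and sent[pos][0] == i:
            current = sent[pos][1]
            pos += 1
        out.append(current)
    return out


samples = [10, 11, 12, 13, 14, 20, 19, 18, 30]
sent = zop_compress(samples, k=3)
print(sent)                                 # transmitted samples
print(zop_reconstruct(sent, len(samples)))  # receiver's view
```

The reconstruction never deviates from the original by more than K, which is the peak error bound the text describes.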

The second method to be considered is linear prediction (Reference 1). The 
linear predictor (LP) uses a first-order extrapolation polynomial of the form 

$$ y_t = y_{t-i} + (y_{t-i} - y_{t-i-1})\, i , $$

where $y_{t-i}$ is the last sample sent and $y_{t-i-1}$ is the value prior to $y_{t-i}$ assumed
by the receiver. Thus, if the previous sample was not transmitted, the
predicted value of $y_{t-i-1}$ is used.

The extrapolation equation is a straight line drawn between the last two data
points. Initially, the first two data points are transmitted, and a straight line is
drawn through them. An aperture of width 2K is placed about the straight line
(Figure 3). If the new data point is within ±K of the predicted value, then that
point is not transmitted. If the new data point is outside the aperture, then that
point is transmitted, and a new prediction line is drawn through the present data
point that was transmitted and the previously predicted data point.

Figure 2 - Zero-order predictor compressed waveform

Assigning a tolerance value K to the compression algorithm subjects the
quantized waveform to error in reconstruction. Assume that the value of K is 3
(Figures 2 and 3); then any sample that falls within three or fewer counts of the
predicted value would not be sent. Thus, an error is introduced into the
transmitted data value (Reference 3).
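
The linear predictor can be sketched in the same way. This version is an interpretation of the description above: the slope is updated only when a sample is transmitted, and it is taken between the transmitted sample and the value the receiver assumed one step earlier. Names and the return format are illustrative.

```python
def lp_compress(samples, k):
    """Return (sent, received) for a first-order (linear) predictor.

    sent     : list of (index, value) pairs actually transmitted
    received : the value the receiver assumes at every sample time
    """
    sent, received = [], []
    slope = 0
    for t, value in enumerate(samples):
        if t < 2:
            transmit = True  # the first two points define the first line
        else:
            predicted = received[-1] + slope
            transmit = abs(value - predicted) > k
        if transmit:
            if t >= 1:
                # New line through this point and the receiver's previous value.
                slope = value - received[-1]
            sent.append((t, value))
            received.append(value)
        else:
            received.append(predicted)  # receiver extrapolates along the line
    return sent, received


samples = [0, 2, 4, 6, 9, 20, 31, 42]
sent, received = lp_compress(samples, k=1)
print(sent)
print(received)
```

As with the ZOP, the difference between each original sample and the receiver's value stays within K.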

The methods of data compression mentioned previously are by no means a 
complete list of possibilities (see References 12 through 21). They were chosen 
because of their ease of implementation and data reconstruction. These methods 
offer the most promise when the impact on ground data processing is considered. 

When data compression is used, the continuity of the input data waveform is 
lost. Hence, a requirement to reestablish the space relationship of the data is 
imposed on the compressed data stream. This requirement is satisfied by the 
use of a time identification encoding scheme. The choice of such a scheme fixes 
the cost of identifying the data. 

When a multiplexed data source is considered, the problem increases because 
now not only the continuity of the data source must be reconstructed, but also the 
source itself must be identified. Figure 4 shows the telemetry pattern in Fig- 
ure 1 after compression. The blank slots represent missing data points caused 
by compression. 

In minor frame 1, some means of identifying the fact that word 6 is missing
is necessary. This may be done by identifying the word position number of each
data word sent, by identifying the source of each data word sent, or by ordering
the data words and identifying the number of words skipped since the last
transmitted sample.

Thus, each word sent could have a 3-bit prefix representing a binary number
of its position in the minor frame (for example, word 4 of minor frame 1 would
be 100A), or it could have a 4-bit source identification followed by a 2-bit
repetition factor where necessary; that is, each of the 12 data sources A through L
could be represented by a unique 4-bit pattern, and for sources appearing more
than once (such as A), the number of the output for that source in that frame
could appear. For example, word 8, frame 1, could be prefixed by 000111, where
0001 represents source A, and 11 represents the third output of source A in
frame 1.

Finally, the number of words since the last transmitted data word could be
encoded into a binary number of fixed length such as four bits, and sent with the
next transmitted sample. For instance, in the first appearance of minor frame 8,
the telemetry word 7 that is transmitted could be prefixed by 0101. These methods
and others will be studied in more detail later.
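
The three prefix styles just described can be illustrated with a short sketch. The function names are hypothetical, and the mapping of sources to patterns (A = 0001, B = 0010, and so on) is an assumption extrapolated from the single example given for source A.

```python
def position_prefix(word_number, width=3):
    """Binary word position in the minor frame, e.g. word 4 -> '100'."""
    if not 0 <= word_number < (1 << width):
        raise ValueError("word number does not fit in the prefix width")
    return format(word_number, f"0{width}b")


def source_prefix(source, occurrence=None):
    """4-bit source pattern, plus a 2-bit repetition factor where needed.

    Assumes sources 'A'..'L' map to 0001..1100 (only A = 0001 is given).
    """
    code = format(ord(source) - ord("A") + 1, "04b")
    if occurrence is not None:
        code += format(occurrence, "02b")
    return code


def skip_prefix(words_skipped, width=4):
    """Fixed-length count of words skipped since the last transmitted word."""
    if not 0 <= words_skipped < (1 << width):
        raise ValueError("skip count does not fit in the prefix width")
    return format(words_skipped, f"0{width}b")


print(position_prefix(4))     # position prefix for word 4
print(source_prefix("A", 3))  # third output of source A
print(skip_prefix(5))         # five words skipped
```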


CHAPTER III 


THEORY OF OPERATION 

Two terms necessary for comparing compression algorithms are raw com- 
pression ratio and actual compression ratio. The raw compression ratio is the 
number of data bits transmitted divided by the number of data bits in the uncom- 
pressed data stream. The actual compression ratio is the number of data bits 
plus identification bits transmitted divided by the number of data bits in the un- 
compressed data stream. The second value includes the cost of identifying the 
data. 

Five compression schemes will be considered. In all five cases, a set of
samples $x_{i_1}, x_{i_2}, x_{i_3}, \ldots, x_{i_n}$ in a minor frame is sent; that is, $n \le 128$ samples
of the frame. They are ordered such that $i_1 < i_2 < \cdots < i_n \le 128$.

In the first three schemes A, B, and C, advantage is taken of the ordering; 
in the last two schemes D and E, it is not. Hence, from information theory, one 
would expect schemes A, B, and C to be better, since schemes D and E have a 
larger range to encode than the other three. That is, it is more probable that 
some of the 128 words will appear rather than that all 128 will be skipped. This 
hypothesis will be examined later. 

First, the two compression schemes that were submitted to computer
simulation will be examined. Both schemes employ the idea of minor frames and word
position in the frame in data recovery, as is presently done. Both attempt to
identify the time of occurrence in the spacecraft multiplexer rather than at the
experiment sensor. Time information is recovered by transmitting every FSP.
Also, both methods employ a parameter T which allows T compressed major
frames to elapse, then transmits a full major frame of uncompressed data, which
allows all measurements to be reestablished without error. In the case of
constants such as voltages and currents, which would be absent for hours, T allows
a periodic check on their value. Another use of the parameter T is to increase
the redundancy when the noise level of the spacecraft-to-ground communication
channel is increased because of electrical storms, solar activity, etc.

In the computer simulation, a 7-bit word was inserted after the FSP to
represent the number of words appearing before the next FSP. This required
the spacecraft to store an entire frame of data before transmission and minimized
the need for ground equipment modification since it enabled the equipment to
predict the time of occurrence of the next FSP. This removes some of the
burden of data error control from the computer since it can be preprocessed to
a limited extent in the frame synchronizer (Reference 22). Since this 7-bit word
could be added to any of the schemes to be discussed, it will be omitted from the
analysis of the schemes.

The first scheme (A) is designed for minimal analog-to-digital equipment
modification. Each transmitted data word in scheme A comprises a 3-bit
identification section, a 0- to 7-bit tag, and a 9-bit data value. The 9-bit value is the
level of quantization used by OGO-B (Reference 23).

The 3-bit identifier represents a binary count of the number of bits in the
variable-length section following it. This enables the data processor to count the
number of bits in a word and thereby separate data words (see Figure 5). This
method was chosen because it represents only a slight change in the current data
processors that are designed for uncompressed data streams.

The 0- to 7-bit tag represents the number of data words missing from the 
minor frame format since the previous data word. This enables the computer 
to reconstruct the frame format; further processing may then be done with no 
change in existing programs. Hence, time and source identification are accom- 
plished in the same way as they are now done. 
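
A sketch of the scheme A word layout and of the frame reconstruction it permits. The bit-string representation, the function names, and the use of `None` for words removed by compression are illustrative assumptions; the field widths (3-bit identifier, 0- to 7-bit tag, 9-bit data value) are those given above.

```python
def encode_scheme_a(skipped, value):
    """3-bit tag length + tag (words skipped, in binary) + 9-bit data value."""
    tag = format(skipped, "b") if skipped else ""  # zero skipped -> empty tag
    if len(tag) > 7 or not 0 <= value < 512:
        raise ValueError("field out of range")
    return format(len(tag), "03b") + tag + format(value, "09b")


def decode_scheme_a(bits):
    """Split a bit string into (skipped, value) pairs."""
    words, pos = [], 0
    while pos < len(bits):
        n = int(bits[pos:pos + 3], 2)            # identifier: tag length
        pos += 3
        skipped = int(bits[pos:pos + n], 2) if n else 0
        pos += n
        value = int(bits[pos:pos + 9], 2)        # 9-bit data value
        pos += 9
        words.append((skipped, value))
    return words


def rebuild_frame(words, frame_length):
    """Place each value back in its slot; None marks a compressed-out word."""
    frame, pos = [None] * frame_length, 0
    for skipped, value in words:
        pos += skipped
        frame[pos] = value
        pos += 1
    return frame


stream = encode_scheme_a(0, 5) + encode_scheme_a(2, 7)
print(decode_scheme_a(stream))
print(rebuild_frame(decode_scheme_a(stream), 5))
```

Once the frame is rebuilt in this form, later processing can treat it like an uncompressed frame with gaps.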

The second scheme (B) is similar to scheme A except that the word identifi- 
cation is modified. The transmitted data word is now composed of a 1- or 4-bit 
identifier, a 0- to 6-bit tag and a 9-bit data value. The first bit of the identifier 
represents whether or not any data words have been omitted from the minor frame 
format. If its value is zero, then no words are absent. If it is a one, however, 
the next three bits represent the number of bits in the tag section and also the 
scale factor for the tag section. 

The 0- to 6-bit tag section represents a binary value that must be added to
2 raised to the power represented by the last three bits of the identifier to obtain
the number of words skipped. For example, if one word is skipped, the data
identifier is 1000; for two words, the data identifier is 10010; and, for 125 words,
the data identifier is 1110111101. Analyzing the last case reveals: 1 signifies
that some word(s) has been skipped; 110 signifies that six bits are in the tag and
that the value of the tag must be added to $2^6$, or 64. The 6-bit tag 111101 is
decoded as $1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = 61$. Thus, 61 + 64
= 125 words are skipped (see Figure 6).
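
The scheme B identifier and tag can be sketched as below and checked against the three examples in the text. The function names and the bit-string representation are illustrative; the 9-bit data value that follows the identifier is omitted here.

```python
def scheme_b_identifier(skipped):
    """Return the identifier plus tag bits for a given number of skipped words."""
    if skipped == 0:
        return "0"                       # single bit: no words absent
    power = skipped.bit_length() - 1     # largest p with 2**p <= skipped
    offset = skipped - (1 << power)      # value carried by the tag
    tag = format(offset, f"0{power}b") if power else ""
    return "1" + format(power, "03b") + tag


def decode_scheme_b_identifier(bits):
    """Return (words_skipped, bits_consumed) from the start of a bit string."""
    if bits[0] == "0":
        return 0, 1
    power = int(bits[1:4], 2)
    offset = int(bits[4:4 + power], 2) if power else 0
    return (1 << power) + offset, 4 + power


for n in (1, 2, 125):
    print(n, scheme_b_identifier(n))
```

The tag length grows with the logarithm of the skip count, so short runs of missing words cost fewer identification bits than long ones.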

The implementation of these schemes is exemplified in Figures 7 and 8. In 
Figure 7, the dashed-line enclosure represents the additional amount of design 
and development needed in the spacecraft. The dashed line encloses a large 
portion of the control box, signifying the necessity for increased complexity in 
this area, and the portion outside the dashed line corresponds to the current 
design. 

Figure 8a represents the current methods for recovery of the transmitted 
data; Figure 8b shows the increased effort necessary to handle compressed data 
in the current system. The new box represents merely a subprogram added to 
the current buffer tape processing programs. Neither scheme will require
additional equipment at the receiving station. However, both schemes require minor
modifications of the current analog-to-digital data processors.

A theoretical calculation of the worst case, that is, all words of a minor frame
present, shows the following actual compression ratios: $C_A = 0.767$ and $C_B = 0.928$,
where $C_A$ refers to scheme A and $C_B$ refers to scheme B. These calculations were
based on the OGO-B spacecraft format, which was used throughout the study.

These schemes are optimal when the probability of not skipping a word is 0.5 for
scheme B and 0.125 for scheme A and, for both schemes, when the probability of
skipping one word is 1/16; skipping two or three words is 1/32; skipping four to seven
words is 1/64; skipping eight to 15 words is 1/128; skipping 16 to 31 words is
1/256; skipping 32 to 63 words is 1/512; and skipping 64 to 125 words is 1/1024.
Computation of these probabilities showed, however, that the foregoing schemes
were not optimal for the data source used. The results of this computation
appear in Table A4. Since the source was
neither of these forms, scheme C was derived as an optimal code by using the
Huffman method (Reference 24). The probabilities used were derived from the
word compression ratios measured by the computer for schemes A and B. Hence,
scheme C represents an optimal code for methods A and B. Scheme C was not
simulated on the computer; its derivation appears in Appendix A. Figure 9a
shows a frame format using scheme C as the time identification algorithm.

Figure 6 - Time identification codes for schemes A and B

Figure 7 - A data compression system

Figure 8 - Data processing methods
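
The Huffman construction used for scheme C can be illustrated with a short sketch that returns only the code length assigned to each symbol. The probabilities below are hypothetical and are not the measured values of Table A4; the function name is likewise an illustration.

```python
import heapq
from itertools import count


def huffman_lengths(probabilities):
    """Return {symbol: code length in bits} for a Huffman code."""
    tie = count()  # tie-breaker so the heap never has to compare symbol lists
    heap = [(p, next(tie), [symbol]) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    lengths = {symbol: 0 for symbol in probabilities}
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)  # two least probable groups
        p2, _, group2 = heapq.heappop(heap)
        for symbol in group1 + group2:
            lengths[symbol] += 1             # each merge adds one bit
        heapq.heappush(heap, (p1 + p2, next(tie), group1 + group2))
    return lengths


# Hypothetical distribution of "number of words skipped".
probs = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
lengths = huffman_lengths(probs)
average = sum(probs[s] * lengths[s] for s in probs)
print(lengths, average)
```

For a distribution whose probabilities are all negative powers of two, as in this example, each code length equals the negative base-2 logarithm of the probability, which is the sense in which a fixed code is optimal for one particular distribution.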

The last two theoretical methods, which were not simulated on the computer,
as mentioned earlier, will be described next. The first method (D) employs a
probability of occurrence table to derive an optimum code for each word of the
frame (see Appendix B). This table is derived by observing the raw compression
ratio for each word. The data words of the minor frame are assumed to be
independent in their occurrence although compression is done on a per-experiment
basis. Hence, if an experiment's output occurs in more than one word, then all
words belonging to the experiment are compressed as a single source. Thus,
each output from the source is assumed to be independent of the previous output
from that source.

The probability of occurrence for a word is found by $P_{Ai} = 1/C_{Ri}$, where
$C_{Ri}$ is the raw compression ratio for word i and $P_{Ai}$ is the probability that word
i occurs in a frame. Since these values represent the word activity, they do not
sum to unity. The table may be normalized by dividing each value by

$$ \sum_{i=1} P_{Ai} , $$

the sum taken over all words of the frame.

The derivation of the code from this table is described in Reference 24.
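
The table construction can be sketched as follows, assuming each raw compression ratio is at least 1 so that its reciprocal is a valid probability. The ratios used are hypothetical, and the function names are illustrative.

```python
def occurrence_probabilities(raw_ratios):
    """Probability that each word occurs in a frame: the reciprocal of its ratio."""
    return [1.0 / ratio for ratio in raw_ratios]


def normalize(values):
    """Scale the word activities so that they sum to unity."""
    total = sum(values)
    return [v / total for v in values]


# Hypothetical per-word raw compression ratios.
activity = occurrence_probabilities([1.0, 2.0, 4.0])
table = normalize(activity)
print(activity, sum(activity))
print(table, sum(table))
```

The unnormalized activities describe how often each word appears; the normalized table is the form needed as input to a code construction such as the Huffman method.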

Figure 9 - Data formats for compression schemes C, D, and E

Now each word has its own label that defines it every time the word appears
(see Figure 9b). Once the word is defined, the frame can be reconstructed, and
processing is identical to current methods.

The last method (E) attempts to find an optimum code for each experiment
sensor, based on a probability of occurrence table for the experiment sensor
(see Appendix C). This table is found by summing all the compressed output bits
for each sensor and dividing by the total number of bits for that sensor in the
uncompressed data stream, yielding a raw compression ratio for each sensor. The
probability of occurrence is then found as in scheme C, and the code identification
derived. As each word is transmitted, as required by the compression algorithm,
this identification tag, modified where necessary, is also sent (see Figure 9c).

For sensors with more than one word per frame, an ambiguity can arise 
concerning which word is being sent. This problem is increased when the posi- 
tion of the sensor words in the main frame is such that the words are close to- 
gether. In order to minimize the possibility of confusing sensor outputs, addi- 
tional bits are added to the foregoing code when the possibility of confusing the 
outputs is greater than 0.6 percent. Hence the frame can again be reconstructed 
but at the expense of increasing the reconstruction program complexity. 


CHAPTER IV 


RESULTS AND DISCUSSION 

The first phase of the study shows that the ZOP achieves a higher raw
compression ratio with fewer errors than does the LP for most of the sensors. These
values were derived by subjecting the real spacecraft data output to each
compression algorithm for K values ranging from zero to ten over the same time
period. The K values were chosen in such a manner as to provide reasonable
error rates (less than one quantization level RMS error) while yielding good
compression. These K values are shown in Table A1 along with the type of
algorithms used.

Table A2 represents the RMS error introduced for each data word in the 
minor frame. Note that data words 1, 2, 3, 97, 98, and 99 do not appear here. 
Words 1,2, and 3 are the FSP, which is not compressed, and words 97, 98, and 
99 are the subcommutator data words, which represent many different sensors. 
Also notice that certain words appear with K values of zero. These words
represent the spacecraft status, clock, binary subcommutator counter, and certain
experiments for which no transmission errors can be allowed. These data words
are predictable or constant most of the time.

An average error level of less than one quantization level was set
arbitrarily and is assumed to be reasonable. This fixes the peak error rates
for the purposes of this study by selecting K values for each sensor. However,
the experimenter must choose these K values because he has a clearer
understanding of the limitations of his instruments and the usefulness of his data.

Now the problem becomes one of choosing an identification scheme for each 
data point. For the data used, the five previously described schemes show the 
following overall compression ratios: scheme A = 5.47, scheme B = 5.58, 
scheme C = 5.97, scheme D = 5.58, and scheme E = 5.55. 

The foregoing results for schemes A and B were obtained by computer
simulation. In order to compare schemes A and B with the other schemes, a theoretical
calculation was accomplished by use of the following equation: the average number
of identification bits for a particular word, k, equals $\sum_i (n_i \times P_i) \cdot 4000$, where $n_i$
represents the number of bits needed to encode the fact that i words were skipped
prior to word k, and $P_i$ is the probability of skipping i words prior to word k; $P_i$
is found from

$$ P_i = P_{Ak}\,[1 - P_{A(k-1)}]\,[1 - P_{A(k-2)}] \cdots [1 - P_{A(k-i+1)}]\,[P_{A(k-i)}] , $$

where $P_{Ak}$ is the probability of occurrence for word k and is equal to $1/C_k$, where
$C_k$ is the raw compression ratio of word k. The number 4000 represents the
number of minor frames over which the sample was taken.
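
The calculation can be sketched under the stated independence assumption. The indexing convention used here (a run of `skipped` absent words bounded by two present words, with 0-based word positions) is an assumption and may be offset by one from the subscripts in the thesis; the probabilities are hypothetical, and the factor for the number of minor frames is left out.

```python
def skip_probability(p, k, skipped):
    """Probability that word k is sent after exactly `skipped` absent words.

    p is a list of per-word occurrence probabilities, assumed independent.
    """
    prev = k - skipped - 1  # last word transmitted before the run
    if prev < 0:
        raise ValueError("run extends before the start of the frame")
    prob = p[k] * p[prev]
    for j in range(prev + 1, k):
        prob *= 1.0 - p[j]  # each intervening word must be absent
    return prob


def expected_id_bits(p, k, bits_for_skip):
    """Average identification bits per frame contributed by word k."""
    return sum(bits_for_skip(s) * skip_probability(p, k, s) for s in range(k))


def scheme_b_bits(skipped):
    """Identifier plus tag length for scheme B."""
    return 1 if skipped == 0 else 4 + skipped.bit_length() - 1


# Hypothetical frame: word 0 always present (like an FSP), word 3 always present.
p = [1.0, 0.5, 0.5, 1.0]
print(expected_id_bits(p, 3, scheme_b_bits))
```

Summing this quantity over all words of the frame and adding the data bits gives the transmitted bit count from which an actual compression ratio can be computed.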

The results of this calculation show $C_A = 5.44$ and $C_B = 5.55$, in close agreement
with the simulated results. These values are found under the assumption that
the probability of occurrence of a particular word is independent of the probability
of the word just prior to it. This assumption seemed to be valid since the
computed compression ratios closely agree with the simulated ones. Since schemes
A and B are not optimal, an optimal code was derived for the data source. The
data for this derivation and the code appear in Appendix A. Scheme C represents
the result of calculating the compression for this code. As expected, this scheme
shows an improvement in compression over the nonoptimal methods.

The hypothesis suggested earlier that schemes A, B, and C are better than 
schemes D and E is not completely valid. Although the optimum method used in 
scheme C did yield a slight improvement over schemes D and E, the improvement 
was not significant. On the other hand, schemes A and B, since they were not
optimum, yielded a poorer compression ratio than schemes D and E. This is 
explained by the fact that the probability distribution of the number of words 
skipped was more evenly distributed than expected. 
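
The effect of an evenly spread distribution can be illustrated with the entropy, which bounds the average length of any uniquely decodable code from below. The two four-symbol distributions in this sketch are hypothetical and are not the measured skip distribution.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits per symbol of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


peaked = [0.85, 0.05, 0.05, 0.05]  # hypothetical: one skip count dominates
even = [0.25, 0.25, 0.25, 0.25]    # hypothetical: all skip counts equally likely

h_peaked = entropy_bits(peaked)
h_even = entropy_bits(even)
```

For the peaked case the entropy is well under one bit per symbol, so a variable-length code can beat a fixed 2-bit field by a wide margin. For the even case the entropy equals 2 bits, and no code can improve on the fixed-length field.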

Since much of the past work on data compression concerned time-identification
of a single source, the last two schemes were used to compare the effects of
carrying this work over to a multiplexed source. In scheme D, each word of the
minor frame is treated as a data source for time-identification purposes. In
scheme E, each sensor is identified separately in the multiplexed data stream.

The data for these derivations and the respective codes appear in Appendix B
for scheme D and Appendix C for scheme E. These schemes are representative
of past study on a single-source model, and they yield about the same compression
as schemes A and B. Further, when scheme C is considered, it is seen that
schemes D and E are suboptimal.

Also in scheme E, one encounters the problem of ambiguous data, since a
sensor with more than one output in a minor frame may appear twice with no
other sensor data separating the two outputs in the compressed data stream.
Hence, when data from such a sensor are received, one may not know which output
the data represent. The result quoted for scheme E represents a probability of
ambiguity of 0.006. It is assumed that this is an acceptable level of
ambiguity. Table C2 represents the effect on compression of removing
individually those sensors that have a possible ambiguous output. Table C2 also
shows the different probabilities of ambiguity associated with each ambiguous
sensor.


CHAPTER V 


CONCLUSIONS 

The use of premultiplexer compression offers promise because the nature 
of the source is most prominent at this point. The need for transmitting only the 
most meaningful information is dramatized by the present and foreseen telemetry 
limitations at planetary distances. With reasonable ground facilities, an 8-foot 
antenna reflector would permit less than 50 bits per second to be transmitted to 
earth from a spacecraft at 5 astronomical units distance. Even with foreseen 
improvements in the telemetry system, this figure would only increase to a few 
thousand bits per second. Since such a telemetry system cannot be allowed to 
overload with redundant data, some form of compression will be essential. 

Two types of compression algorithms were studied, the ZOP and the LP. 
Table A1 shows that the ZOP was heavily favored (only two LP’s out of 128). 
Actually this figure is deceptive since Table A1 refers to the frame matrix, and 
several sensors appear more than once in this matrix. A more realistic figure 
is two LP’s out of 89 sensors. 
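
A zero-order predictor of the kind referred to here can be sketched as follows. The exact ZOP variant and the precise role of the K value are defined elsewhere in the thesis, so this sketch assumes the simplest form: a sample is transmitted only when it differs from the last transmitted value by more than a tolerance. The function name and sample values are hypothetical.

```python
def zop_compress(samples, tolerance):
    """Keep a sample only when it leaves the tolerance band around the
    last transmitted value (zero-order prediction: 'no change')."""
    kept = []
    last = None
    for index, value in enumerate(samples):
        if last is None or abs(value - last) > tolerance:
            kept.append((index, value))  # the index is the time tag to be encoded
            last = value
    return kept


toy_samples = [10, 10, 11, 13, 13, 12, 20]
toy_kept = zop_compress(toy_samples, tolerance=1)
toy_ratio = len(toy_samples) / len(toy_kept)
```

Each kept pair carries an index because the receiver must know when the sample occurred, which is the time-identification problem the five schemes address.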

Hence, the cost of implementing an LP seems to be too great, considering
its usage. It would be better to sacrifice compression for this reduction in cost.
This argument is strengthened further by the fact that both sources that use an
LP are binary counters and hence generate a sawtooth waveform. Thus, they
need to be transmitted only at their zero-crossing points. It would be less
expensive to implement such a transmitting function than an LP compressor.

Notice also in Table A1 that words 36, 101, and 107 have infinite K values. 
This is true because word 101 is a fill word in the OGO-B frame format used to 
keep the frame matrix constant; words 36 and 107 were stipulated by the experi- 
menter as useless to him at 64,000 bps transmission, which was the case here. 
Thus, an advantage of a compression system is to remove useless data from the 
transmitted data stream. 

This may seem a trivial case. Actually it is not since generally not all ex- 
periments on a spacecraft of this type are turned on at all times. The reasons 
for this are basically three-fold. First, because of the design of certain experi- 
ments and the need for saving weight in spacecraft design, the drain on the space- 
craft power supply would be too great to permit continuous operation. Secondly, 
because of the nature of certain experiments and the spacecraft orbit, it is im- 
possible to derive useful data throughout the entire orbit. 

The third but less important reason is that, because of the unknown factors
of space parameters, certain experiments, when flown for the first time, are
found to be designed below their useful range. An example of this was Dr. Van
Allen's first flight package designed to measure energetic particles. Because
the now famous belts were unknown at the time, his instruments went to
their maximum value and remained there.

Five data compression methods have been proposed. In choosing which one
to apply, the cost of implementation versus the compression must be kept in mind.
Implementing schemes C, D, and E would require the development of an entirely
new system in the analog-to-digital phase of the processing. Also, extensive
reprogramming would have to be done to handle the compressed data in these
forms. The spacecraft design would be very complex compared to the current
design. Therefore, long development times with great initial cost would be
expected.

Schemes A and B, on the other hand, take full advantage of the current data 
processing programs since only a subprogram is needed to get back to the frame 
matrix. The frame matrix is important at present because of its usage in time- 
correlation of the experimenter's data relative to the orbit data. Also, it is 
important because existing edit and decom programs are based on it.
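
The kind of subprogram meant here can be sketched as below. The stream format is an assumption made for illustration: each entry is a (gap, value) pair, where gap is the number of word positions advanced since the previously transmitted word, and untransmitted words hold their previous value. The actual formats of schemes A and B are defined earlier in the thesis and may differ.

```python
def rebuild_frames(stream, words_per_frame, initial):
    """Rebuild full frames from (gap, value) pairs.

    gap   -- word positions advanced from the previously transmitted word
             (1 means the very next word position)
    value -- the transmitted sample for the word reached
    Skipped words keep their previous value.
    """
    current = list(initial)
    frames = []
    position = -1  # position of the previously transmitted word
    for gap, value in stream:
        position += gap
        while position >= words_per_frame:  # crossed a frame boundary
            frames.append(list(current))
            position -= words_per_frame
        current[position] = value
    frames.append(list(current))  # emit the frame still in progress
    return frames


# Hypothetical 4-word frame and compressed stream.
toy_frames = rebuild_frames([(1, 9), (2, 5), (2, 7), (3, 1)], 4, [0, 0, 0, 0])
```

Once the frame matrix is restored in this way, programs written for uncompressed frames can run on it unchanged, which is the advantage claimed for schemes A and B.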

Furthermore, scheme A could be handled on existing equipment without modi- 
fication, and scheme B requires only minor modifications. Scheme C, which repre- 
sents an optimum coding scheme for schemes A and B, would require extensive 
modification. The slight gain in compression that it offers doesn't warrant its 
usage in a practical system. 

Since the initial cost of an untried data compression system is one of the
major deterrents to its usage, schemes A and B are proposed as a practical first
step toward more sophisticated methods. Their moderate initial cost would allow
studies of compression algorithms to be made on the spacecraft. Then, when
optimum data compression methods are found, a gradual change to better time
encoding schemes could follow. Thus, the development of better ground systems
could be spread out over years while the advantages of data compression were
being used in a somewhat more limited sense.

The results show that encoding the number of samples skipped is at least
equivalent to encoding each source when considering a multiplexed data stream.
In the optimal case, this method is slightly better than source identification, and
the ease of implementing it compared to identifying each source, particularly for
schemes A and B, leads to the conclusion that this type of time identification is
better suited to multiplexed data.

Further, it has been shown that schemes A and B are at least equivalent to 
the more optimum methods; i.e., the improvement in compression of the optimum 
methods (C, D, E) is not significant. This fact is of major importance when it is 
considered that the latter three methods are very dependent on the probability 
distribution of the data, which may not be known in advance. 


APPENDIX A 


DERIVATION OF THE COMPRESSION RATIO FOR SCHEME C 

The data of Table A4 were obtained from the first phase of the study, the 
selection of K values and compression algorithms. The total number of data bits 
needed to be output over 4000 frames was found to be 554,274. The total number 
of data bits in 4000 frames is 4,608,000, yielding a raw compression ratio of 
8.31. 
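
As a check on the figures just quoted, the raw bit count and the raw compression ratio can be worked out explicitly. The 9-bit word length used below is an inference from 4,608,000 / (4000 x 128) and is not stated in this appendix.

```latex
\begin{align*}
N_{\text{raw}} &= 4000 \text{ frames} \times 128 \text{ words} \times 9 \text{ bits}
               = 4{,}608{,}000 \text{ bits}, \\
C_{\text{raw}} &= \frac{4{,}608{,}000}{554{,}274} \approx 8.31 .
\end{align*}
```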

Table A4 represents the normalized, ordered probability of skipping the
number of words in the leftmost column. These probabilities were found from
the equation

$$\sum_{k=1}^{128} P_{ki} ,$$

where $P_{ki}$ equals the probability of skipping $k$ words prior to word $i$;
$P_{ki}$ is found from

$$P_{ki} = P_{Ai}\,[1 - P_{A(i-1)}]\,[1 - P_{A(i-2)}] \cdots [1 - P_{A(i-k+1)}]\,P_{A(i-k)} ,$$

where $P_{Ak}$ is the probability of occurrence for word $k$ and is equal to
$1/C_k$, where $C_k$ is the raw compression ratio of word $k$.

The identification code is derived by use of the Huffman Algorithm from 
Noiseless Coding Theory. The code and the associated number of identification 
bits are in Table A4. 
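
The Huffman construction can be sketched compactly in Python. The four-symbol distribution below is a hypothetical stand-in for the 128-entry skip distribution of Table A4, and ties may be broken differently from the thesis, so the bit patterns (though not the optimal average length) can differ from a hand-derived code.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Return {symbol: bitstring} for a dict {symbol: probability}."""
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(p, next(tiebreak), {symbol: ""})
            for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)  # two least probable subtrees
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical distribution over "number of samples skipped".
toy_probs = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}
toy_code = huffman_code(toy_probs)
toy_average = sum(toy_probs[s] * len(toy_code[s]) for s in toy_probs)
```

The average code length, `toy_average`, is the per-identification cost. Multiplying it by the number of transmitted data points gives the total number of identification bits that enters the compression ratio.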

The net cost of identifying the data points is found from the summation

$$\sum_{i=1}^{128} n_i P_{ki} \times 4000 .$$

This total is added to the number of data bits, yielding

554,274 + 218,094 = 772,368

total bits necessary for transmission.

Hence, the compression is computed from

$$C_C = \frac{4{,}608{,}000}{772{,}368} = 5.96 .$$

Table A1


K Values and Compression Algorithms for the OGO-B Frame Matrix

Table A3

Raw Compression and Probability of Occurrence for Each Data Word

Data Word   Compression   Probability
        1          1.00      1.000000
        2          1.00      1.000000
        3          1.00      1.000000
        4       2000.00      0.000500
        5       4000.00      0.000250
        6             ∞      0.000000
        7       4000.00      0.000250
        8          8.31      0.120300
        9          8.06      0.124100
       10          9.87      0.101317
       11             ∞      0.000000
       12         60.60      0.016501
       13         16.06      0.062266
       14          7.44      0.134408
       15          1.13      0.884955
       16          6.51      0.153610
       17             ∞      0.000000
       18             ∞      0.000000
       19       2000.00      0.000500
       20        210.52      0.004750
       21       4000.00      0.000250
       22             ∞      0.000000
       23         31.49      0.031756
       24       4000.00      0.000250
       25         19.13      0.052270
       26          7.42      0.134770
       27        210.52      0.004750
       28          5.81      0.172100
       29         26.84      0.037257
       30       4000.00      0.000250
       31       4000.00      0.000250
       32         95.23      0.010500
       33             ∞      0.000000
       34             ∞      0.000000
       35         27.39      0.036510
       36             ∞      0.000000
       37          8.43      0.118620
       38          1.17      0.854700
       39          6.15      0.162600
       40       4000.00      0.000250
       41          9.75      0.102564
       42          9.85      0.101522
       43       4000.00      0.000250
       44         23.25      0.043010
       45          6.72      0.148810
       46          7.60      0.131578
       47          2.34      0.427350
       48          1.96      0.510200
       49         17.62      0.056753
       50         20.00      0.050000
       51         21.62      0.046253
       52        800.00      0.001250
       53        108.10      0.009250
       54       4000.00      0.000250
       55          4.11      0.243300
       56          3.48      0.287360
       57         17.31      0.057770
       58         86.95      0.011500
       59         86.95      0.011500
       60         13.60      0.073529
       61           8.6      0.116279
       62          1.16      0.862068
       63         11.39      0.087796
       64          7.18      0.139275
       65         62.50      0.016000
       66             ∞      0.000000
       67             ∞      0.000000
       68          9.66      0.103500
       69             ∞      0.000000
       70       4000.00      0.000250
       71       1333.33      0.000750
       72        800.00      0.001250
       73         12.34      0.081037
       74          9.75      0.102564
       75         75.47      0.013250
       76         31.25      0.032000
       77         16.00      0.062500
       78         43.47      0.023004
       79         30.30      0.033003
       80          7.72      0.129530
       81             ∞      0.000000
       82       1000.00      0.001000
       83             ∞      0.000000
       84       2000.00      0.000500
       85          6.42      0.155763
       86          7.17      0.133868
       87          1.15      0.869565
       88          7.98      0.125313
       89         36.36      0.027502
       90        400.00      0.002500
       91       4000.00      0.000250
       92       4000.00      0.000250
       93       4000.00      0.000250
       94             ∞      0.000000
       95             ∞      0.000000
       96        102.56      0.009750
       97          4.72      0.211800
       98          3.79      0.263800
       99          3.99      0.250600
      100         37.73      0.026504
      101             ∞      0.000000
      102         57.97      0.017250
      103          6.96      0.143673
      104         19.13      0.052273
      105         10.41      0.096061
      106          8.84      0.113122
      107             ∞      0.000000
      108         86.95      0.011500
      109         14.38      0.069541
      110          6.67      0.149925
      111          2.39      0.418410
      112          2.03      0.492610
      113       2000.00      0.000500
      114       4000.00      0.000250
      115         19.23      0.052002
      116        285.71      0.003500
      117          7.44      0.134408
      118          1.13      0.884955
      119          6.75      0.148148
      120          7.85      0.127388
      121        285.71      0.003500
      122        285.71      0.003500
      123        285.71      0.003500
      124        266.66      0.003750
      125        181.81      0.005500
      126        800.00      0.001250
      127        166.66      0.006000
      128         16.19      0.061766

Table A4


Huffman Code for Number of Samples Skipped, Scheme C

Number of
Samples     Probability
Skipped     of Skipping    n_i*   Code
      1     0.173235868       3   000
      6     0.092404574       3   100
     12     0.076578412       4   0011
      7     0.070025065       4   0101
      2     0.067545370       4   0110
     10     0.054338394       4   1101
      5     0.047242803       4   1111
     11     0.043795023       4   1010
     23     0.041078904       5   00100
      9     0.039457290       5   00101
      8     0.035963183       5   01000
      3     0.035715062       5   01110
     13     0.031583676       5   11000
     14     0.025718843       5   11100
      4     0.023928569       5   11101
     25     0.023762695       5   10110
     24     0.015909861       6   011110
     22     0.014325995       6   110010
     15     0.014261874       6   110011
     18     0.011157012       6   101110
     19     0.009553423       7   0100100
     17     0.008817485       7   0100110
     16     0.008670054       7   0100111
     20     0.005256475       7   1011111
     21     0.0046?6306       8   01001010
     31                       8   01111100
     35     0.002893870       8   10111100
     30     0.002415339       9   010010110
     32     0.002060067       9   011111010
     26     0.002014548       9   011111011
     29     0.001392703       9   101111010
     33     0.001058470      10   0100101110
     27     0.001050231      10   0100101111
     36     0.000859126      10   0111111000
     28     0.000857378      10   0111111001
     39     0.000769617      10   0111111101
     34     0.000755170      10   0111111110
     37     0.000694837      10   1011110110
     40     0.000487073      11   01111110101
     38     0.000461838      11   01111110110
     41     0.000431112      11   01111110111
     49     0.000382027      11   01111111110
     47     0.000339404      11   01111111111
     42     0.000257572      11   10111101110
     44     0.000235083      12   011111101000
     48     0.000214510      12   011111110000
     43     0.000203769      12   011111110001
     50     0.000199969      12   011111110010
     46     0.000141640      13   1011110111010
     45     0.000135413      13   1011110111011
     56     0.000122354      13   0111111010010
     55     0.000048512      14   01111111001100
     54     0.000037403      15   011111101001100
     53     0.000034501      15   011111101001101
     51     0.000028684      15   011111101001110
     52     0.000026756      15   011111110011110
     59     0.000026387      15   011111110011100
     57     0.000014995      16   0111111010011110
     58     0.000013779      16   0111111100111110
     62     0.000011165      16   0111111100111010
     63     0.000011157      16   0111111100111011
     61     0.000010342      16   0111111100110100
     72     0.000009806      16   0111111100110110
     64     0.000009205      17   01111110100111110
     60     0.000009205      17   01111110100111111
     70     0.000006534      17   01111111001111110
     65     0.000005995      17   01111111001101010
            0.000005547      17   01111111001101110
     73     0.000002335      18   011111110011010111
     66     0.000002048      18   011111110011011110
     69     0.000001878      18   011111110011011111
     67     0.000001860      19   0111111100111111100
     68     0.000001376      19   0111111100110101100
            0.000001270      19   0111111100110101101
     84     0.000000838      20   01111111001111111101
     80     0.000000754      20   01111111001111111110
     77     0.000000495      21   011111110011111110100
     82     0.000000466      21   011111110011111110110
     83     0.000000424      21   011111110011111111000
     79     0.000000397      21   011111110011111111110
     76     0.000000270      22   0111111100111111101010
     78     0.000000186      22   0111111100111111101110
     75     0.000000159      22   0111111100111111110010
     96     0.000000122      23   01111111001111111010110
     81     0.000000111      23   01111111001111111011110
     95     0.000000080      23   01111111001111111111100
     88     0.000000079      23   01111111001111111111101
     90     0.000000070      23   01111111001111111111110
     97     0.000000069      23   01111111001111111111111
     85     0.000000056      24   011111110011111110111110
    103     0.000000053      24   011111110011111110111111
     94                      24   011111110011111111001100
     89     0.000000047      24   011111110011111111001101
     91     0.000000043      24   011111110011111111001111
     87     0.000000033      25   0111111100111111101011100
     86     0.000000032      25   0111111100111111101011101
    102     0.000000024      25   0111111100111111110011100
     92     0.000000015      26   01111111001111111010111100
     93     0.000000013      26   01111111001111111010111110
                             26   01111111001111111100111010
    108     0.000000008      27   011111110011111110101111010
    109                      27   011111110011111110101111110
    101                      27   011111110011111111001110111
    115                      28   0111111100111111101011110110
    107     0.000000003      28   0111111100111111101011111110
     98     0.000000003      28   0111111100111111101011111111
    104     0.000000003      28   0111111100111111110011101100
     99     0.000000002      29   01111111001111111010111001110
    106     0.000000002      29   01111111001111111010011101111
    105     0.000000001      30   011111110011111111001110110100
    114     0.000000001      30   011111110011111111001110110101
    110     0.000000001      30   011111110011111111001110110110
    112     0.000000000      34   0111111100111111110011101101110000
    113     0.000000000      34   0111111100111111110011101101110001
    116     0.000000000      34   0111111100111111110011101101110010
    117     0.000000000      34   0111111100111111110011101101110011
    111     0.000000000      34   0111111100111111110011101101110100
    125     0.000000000      34   0111111100111111110011101101110101
    119     0.000000000      34   0111111100111111110011101101110110
    118     0.000000000      34   0111111100111111110011101101110111
    120     0.000000000      34   0111111100111111110011101101111000
    124     0.000000000      34   0111111100111111110011101101111001
    122     0.000000000      34   0111111100111111110011101101111010
    121     0.000000000      34   0111111100111111110011101101111011
    123     0.000000000      34   0111111100111111110011101101111100
    126     0.000000000      34   0111111100111111110011101101111101
    127     0.000000000      34   0111111100111111110011101101111110
    128     0.000000000      34   0111111100111111110011101101111111

*n_i = number of bits in the code




APPENDIX B 


DERIVATION OF THE COMPRESSION RATIO FOR SCHEME D 


From the raw compression ratios $C_{Ri}$ of Table A3, the probability of
occurrence table is found from the equation

$$P_{Ai} = \frac{1}{C_{Ri}} \quad \text{for } i = 4, \ldots, 128 .$$

This table is arranged in decreasing order, normalized by the factor

$$\frac{1}{\sum_{i=4}^{128} P_{Ai}} ,$$

and is presented in Table B1.

The Huffman code and associated code lengths are also presented there. The
net cost of identifying the data points for this scheme is found by the summation

$$\sum_{i=4}^{128} n_i P_{Ai} \times 4000 .$$

In Table B1 the first three words do not appear since they represent the FSP,
which was not compressed. However, these 108,000 bits are present in the raw
data bit count. The result of these calculations yields

554,274 + 270,340 = 824,614 .

The compression ratio is

$$C_D = \frac{4{,}608{,}000}{824{,}614} = 5.59 .$$

Table B1


Huffman Code for the Main Frame Word Positions, Scheme D

Main Frame    Probability
Word Number   of Occurrence   n_i*   Code
        118   0.071370951        4   1011
         15   0.070846610        4   1100
         87   0.069999597        4   1101
         62   0.068991248        4   1110
         38   0.068708910        4   1111
         48   0.040979309        5   10000
        112   0.039668455        5   10010
         47   0.034445206        5   00000
        111   0.033739362        5   00001
         56   0.023131529        5   01001
         98   0.021276167        5   01110
         99   0.020207317        6   100011
         55   0.019602307        6   100111
         97   0.017081434        6   101010
         28   0.013874884        6   001001
         39   0.013108539        6   001011
         85   0.012564030        6   001100
         16   0.012382527        6   001101
        110   0.012080023        6   001110
         45   0.011999355        6   010000
        119   0.011938854        6   010001
        103   0.011575848        6   010100
         64   0.011233009        6   010110
         26   0.010870004        6   010111
        117   0.010829670        6   011000
         14   0.010829670        6   011001
         86   0.010789336        6   011010
         46   0.010607833        6   011011
         80   0.010446497        6   011110
        120   0.010264994        7   1000100
         88   0.010103658        7   1000101
          9   0.010002823        7   1001101
          8   0.009700319        7   1010000
         37   0.009559150        7   1010011
         61   0.009377647        7   1010010
        106   0.009115476        7   1010110
         68   0.008349131        7   0001000
         74   0.008268463        7   0001010
         41   0.008268463        7   0001011
         42   0.008187795        7   0001100
         10   0.008167628        7   0001110
        105   0.007744121        7   0010000
         63   0.007078611        7   0010100
         73   0.006534102        7   0010101
         60   0.005929093        7   0011110
        109   0.005606421        7   0101010
         77   0.005041764        8   10001000
         13   0.005021579        8   10001001
        128   0.004981245        8   10100010
         57   0.004658573        8   10100011
         49   0.004577905        8   10101110
        104   0.004214899        8   00010010
              0.004214899            00010011
              0.004194732            00011010
              0.004033397            00011011
              0.003730892            00011110
              0.003468721            00100011
              0.003004880            00111111
              0.002944379            01010110
              0.002662042            01111100
              0.002581374            01111110
              0.002561207            01111111
              0.002218368            101011110
              0.002137700            000111110
              0.001855362            001000100
              0.001391522            010101110
              0.001331021            010110110
              0.001290687            011111011
              0.001068850            1010111110
              0.000927681            0010001010
              0.000927681            0010001011
              0.000927681            0001111110
              0.000847013            0011111000
              0.000786512            0011111010
              0.000746178            0101011110
              0.000484008            00011111110
              0.000443674            00011111111
              0.000383173            00111110010
         20   0.000383173       11   00111110110
        124   0.000302505       11   01010111110
        123   0.000282338       12   101011111100
        122   0.000282338       12   101011111101
        121   0.000282338       12   101011111110
        116   0.000282338       12   101011111111
         90   0.000201670       12   001111100110
        126   0.000100835       13   0011111001110
         72   0.000100835       13   0011111001111
         52   0.000100835       13   0011111011100
         82   0.000080668       13   0011111011110
         71   0.000060501       14   00111110111010
        113   0.000040334       14   00111110111011
         84   0.000040334       14   00111110111110
         19   0.000040334       14   00111110111111
          4   0.000040334       14   01010111111100
        114   0.000020167       15   010101111111010
         93   0.000020167       15   010101111111011
         92   0.000020167       15   010101111111100
         91   0.000020167       15   010101111111101
         70   0.000020167       15   010101111111110
         54   0.000020167       15   010101111111111
         43   0.000020167       15   010101111110000
         40   0.000020167       15   010101111110001
         31   0.000020167       15   010101111110010
         30   0.000020167       15   010101111110011
         24   0.000020167       15   010101111110100
         21   0.000020167       15   010101111110101
          7   0.000020167       15   010101111110110
          5   0.000020167       16   0101011111101111
         95   0.000000000       20   01010111111011100000
        101   0.000000000       21   010101111110111000010
        107   0.000000000       21   010101111110111000011
         94   0.000000000       20   01010111111011100010
         83   0.000000000       20   01010111111011100011
         81   0.000000000       20   01010111111011100100
         69   0.000000000       20   01010111111011100101
         67   0.000000000       20   01010111111011100110
         66   0.000000000       20   01010111111011100111
         36   0.000000000       20   01010111111011101000
         34   0.000000000       20   01010111111011101001
         33   0.000000000       20   01010111111011101010
         22   0.000000000       20   01010111111011101011
         18   0.000000000       20   01010111111011101100
         17   0.000000000       20   01010111111011101101
         11   0.000000000       20   01010111111011101110
          6   0.000000000       20   01010111111011101111

*n_i = number of bits in the code







APPENDIX C 


DERIVATION OF THE COMPRESSION RATIO FOR SCHEME E 

In Table C1, the sensors are numbered by experiment and letter. The letters
represent different sensors of the same experiment. In the cases where a sensor
appears more than once in the main frame, the word position numbers of its
outputs appear to the left. Table C1 is arranged according to the decreasing
order of the probability of occurrence. The probability of occurrence is found
by dividing the bits out by the bits in; then a Huffman code is derived for each
sensor.

For those sensors with more than one output per frame, the probability of
an ambiguous output is calculated by finding $P_{Ak}$, where $P_{Ak}$ is as
defined in Appendix A and $k$ is the number of words between outputs of the
sensor. The Huffman code is then modified where necessary so that the
probability of ambiguity is less than 0.006. These modifiers appear at the end
of Table C1.

Table C2 represents the probability of ambiguity of multiple-output sensors
and the effect of added coding to remove this ambiguity on the overall compression
ratio. For the case of an ambiguity of less than 0.006, the compression ratio is
found by the following procedure. The number of identification bits is found by
the sum

$$\sum_i n_i P_i \times 4000 .$$

To this sum is added the number of data bits required. In Table C2, the first
three words representing the FSP do not appear; however, they do appear in the
sum of the output bits. The results of these computations show

$$C_E = \frac{4{,}608{,}000}{554{,}274 + 275{,}351} = \frac{4{,}608{,}000}{829{,}625} = 5.55 .$$


The other compression ratios in Table C2 were computed similarly. 
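
The arithmetic behind these ratios is the same for every scheme, and it can be checked with a short Python sketch. The bit counts are the ones quoted in Appendixes A, B, and C and in Table C2. The function and constant names are illustrative only.

```python
RAW_BITS = 4_608_000  # data bits in 4000 uncompressed frames
DATA_BITS = 554_274   # data bits remaining after compression


def overall_ratio(identification_bits):
    """Overall compression ratio once identification bits are added."""
    return RAW_BITS / (DATA_BITS + identification_bits)


ratio_c = overall_ratio(218_094)      # scheme C, Appendix A
ratio_d = overall_ratio(270_340)      # scheme D, Appendix B
ratio_e = overall_ratio(275_351)      # scheme E, Appendix C
ratio_none = overall_ratio(221_964)   # Table C2, no ambiguity removed
```

The identification bits enter only through the denominator. Once the raw ratio is about 8.3, an identification cost of roughly half the compressed data bits lowers the overall ratio to between 5.5 and 6.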



Table C1

Huffman Code for the Experiment Sensor, Scheme E

[Table body not recoverable from the source scan.]

*n_i = number of bits in the code

Table C2


Comparison of Ambiguity in Scheme E

          Probability    Number of         Total Number   Actual
Sensor    of Ambiguity   Identification    of Bits        Compression
Number    Removed        Bits Needed       Output         Ratio
-         None           221,964           776,238        5.936
10B       0.333          264,017           818,291        5.631
18        0.0115         233,298           787,572        5.850
10C       0.006          231,684           785,958        5.862
10A       0.0011         229,608           783,882        5.878
13A       0.00006        225,192           779,466        5.911
17C       0.00005        225,963           780,237        5.906
17B       0.00003        225,345           779,619        5.911
17A       0.000002       223,089           777,363        5.927
13B       0.000008       222,788           777,062        5.931
-         All            300,236           854,510        5.392